Assessing the emergence time of SARS-CoV-2 zoonotic spillover

Understanding the evolution of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV-2) and its relationship to other coronaviruses in the wild is crucial for preventing future virus outbreaks. While the origin of the SARS-CoV-2 pandemic remains uncertain, mounting evidence suggests the direct involvement of the bat and pangolin coronaviruses in the evolution of the SARS-CoV-2 genome. To unravel the early days of a probable zoonotic spillover event, we analyzed genomic data from various coronavirus strains from both human and wild hosts. Bayesian phylogenetic analysis was performed using multiple datasets, using strict and relaxed clock evolutionary models to estimate the occurrence times of key speciation, gene transfer, and recombination events affecting the evolution of SARS-CoV-2 and its closest relatives. We found strong evidence supporting the presence of temporal structure in datasets containing SARS-CoV-2 variants, enabling us to estimate the time of SARS-CoV-2 zoonotic spillover between August and early October 2019. In contrast, datasets without SARS-CoV-2 variants provided mixed results in terms of temporal structure. However, they allowed us to establish that the presence of a statistically robust clade in the phylogenies of gene S and its receptor-binding (RBD) domain, including two bat (BANAL) and two Guangdong pangolin coronaviruses (CoVs), is due to the horizontal gene transfer of this gene from the bat CoV to the pangolin CoV that occurred in the middle of 2018. Importantly, this clade is closely located to SARS-CoV-2 in both phylogenies. This phylogenetic proximity had been explained by an RBD gene transfer from the Guangdong pangolin CoV to a very recent ancestor of SARS-CoV-2 in some earlier works in the field before the BANAL coronaviruses were discovered. Overall, our study provides valuable insights into the timeline and evolutionary dynamics of the SARS-CoV-2 pandemic.


Introduction
The emergence of SARS-CoV-2, a human coronavirus first detected in Wuhan (China) in 2019, has significantly impacted worldwide health, economic, and social landscapes [1].Despite access to early viral sequences and the high interest of the scientific community, the precise chain of zoonotic transmissions leading to the emergence of SARS-CoV-2 remains unknown.
While no consensus has been reached here, numerous recent studies point at pangolins and bats as the most likely natural betacoronavirus reservoirs [2].According to this hypothesis, a spillover event involving bat coronaviruses is the likely cause of the pandemic, with pangolins being possible intermediate hosts [3,4].Concerns also emerge from the observation of humanto-animal transmission, suggesting that SARS-CoV-2 and other coronaviruses with pandemic potential could easily adapt to new hosts [5].For example, it is well known that coronaviruses have the potential to infect a wide range of domestic and wild mammals [6].
This scenario is of particular interest since the spike (S) protein and its receptor-binding domain (RBD) are essential for the process of cellular entry [7].The RBD, contained within the S1 subunit of protein S, interacts with the angiotensin-converting enzyme 2 (ACE2) receptor present on the surface of human cells, and thus facilitates the attachment and subsequent fusion of the viral envelope with the host cell membrane [8].This process is critical for viral infection, and as such, viral RBDs generally show high mutation rates.This is due to the selective pressure associated with the affinity between the RBD and ACE2 regions as well as to the virus ability to evade immune responses [8].Higher rates of recombination are characteristic for the spike gene subunit S1, where the RBD is located, compared to the subunit S2 [9,10].Thus, gene S and RBD sequences are of particular importance for zoonotic studies [11].
Early works have identified the horseshoe bat (Rhinolophus affinis) coronavirus, RaTG13, as the most similar to SARS-CoV-2 with 96% of genomic similarity [12,13].However, the receptor-binding domain of the spike protein of SARS-CoV-2 shares a higher similarity with Malayan pangolin (Manis javanica) coronaviruses, found in the Guangdong province of China [2,4,14], as compared to RaTG13.More recently, bat coronavirus genomes sampled in Laos, and identified as BANAL-52, -103, and -236, have been found to share even higher degree of similarity with the SARS-CoV-2 whole genome, gene S, and RBD sequences [15,16].The presence of mosaic genes in the SARS-CoV-2 genome suggests that multiple horizontal gene transfer and recombination events have affected the evolutionary history of betacoronavirus organisms [9,17].
Many SARS-CoV-2 sequences were collected at the beginning of the pandemic, as well as in the following months and years.These time-stamped sequences and their assignation to specific strains provide a clearer picture of the evolution of this virus in its new host.Using these data as well as genome sequences of various betacoronaviruses, it is possible to reconstruct the early days of zoonosis and deduce a probable timeline of the pandemic.Indeed, evolutionary patterns over time across these sequences can be thought of as a temporal structure [18].This temporal structure enables the reconstruction of accurate phylogenetic trees, providing insights into the timing of important evolutionary events [19].For example, it allows one to estimate the dating of the divergence events between different lineages.
Root-to-tip regressions are an informal way to investigate the presence of temporal signals in a heterochronous dataset [18].This technique employs a rooted molecular phylogeny, whose branch lengths represent the genetic distances and used alongside regressions between the tree tips and the root as a function of their sampling time, to estimate the evolutionary rate (represented as a slope).The intercept with the abscissa indicates the time of origin and the squared Pearson's correlation coefficient (R 2 ) indicates the clocklike behavior.
A more formal assessment of temporal structure in a dataset can be achieved using Bayesian Evaluation of Temporal Signal (BETS) analysis [20].This analysis is conducted by comparing the statistical fit of Bayesian models that may include or not include temporal information (e.g.sampling dates).If the model containing temporal information has a better statistical fit than the model without it, this indicates that the dataset contains a measurable evolving population, suggesting that there exists a statistically meaningful number of genetic differences between the sequences collected over time [21,22].The statistical fit of a model is expressed through its marginal likelihood.The ratio of the marginal likelihood of two competing models (with and without temporal information) is used to calculate a Bayes factor representing the model's statistical support and allowing one to select the best model overall for the data at hand.Models containing a temporal structure are usually better suited for producing reliable divergence time estimates [23].
A transmission bottleneck is a phenomenon linked to genetic drift that occurs when a virus gets transmitted to a new host population that was not previously accessible through a specific event such as zoonosis [24].This bottleneck limits the shared genetic diversity between viral populations at either end [25].Studying the phylodynamics of zoonotic spillover events from a natural reservoir into a human population and identifying the most recent common ancestors are crucial steps for understanding and preventing future spillovers [26].Such analyses, focusing on the timeline of zoonosis, can be very challenging in case of high sequence evolutionary rates and sparse collection of dated samples.
The emergence and rapid spread of SARS-CoV-2 have highlighted the importance of understanding the exact virus origin, transmission dynamics, and main evolutionary patterns.While numerous evolutionary studies have provided valuable insights on the emergence and phylogenetic relationships between SARS-CoV-2 and its close relatives, several research gaps regarding the emergence time and the importance of some evolutionary events such as speciation, horizontal gene transfer, intergenic and intragenic recombination, gene duplication, and gene loss, still need to be addressed.In order to study these evolutionary events comprehensively, phylogenetic studies should be conducted not only on the whole genome coronavirus sequences, but on the individual coronavirus gene sequences as well.By employing Bayesian phylogenetics and the BETS analysis, the timing of the most important clade divergence events of different coronavirus phylogenies will be assessed, including the inference of the precise timing of SARS-CoV-2 zoonotic spillover.The main findings of this study can provide insights into phylogenetic relationships between different betacoronavirus lineages and lead to some interesting new discoveries and interpretations.The applied methodology can be beneficial for identifying the timing of future possible virus outbreaks.
In this study, we will analyze datasets that cover both evolutionary ends (the host reservoir and the final spillover results) to better understand the event itself.As such, our first set of data consists mostly of human SARS-CoV-2 strains in addition to their closest ancestors such as the bat BANAL and RaTG13 CoVs as well as the Guangdong pangolin CoV (sampled in the Guangdong province of China).It provides a snapshot of the SARS-CoV evolution after the transmission event.Our second set of data consists of coronaviruses found in wild bats and pangolins, and a single human strain (i.e. the SARS-CoV-2 reference genome).The second dataset represents the evolutionary dynamics of coronaviruses in their natural reservoir.Our first objective is to validate the presence of temporal structure inherent to the considered gene and genome coronavirus sequences, encompassing both sides of the transmission bottleneck.The tip-to-root regressions will be carried out for this purpose (see the Methods section for more details).Upon this validation, a Bayesian phylogenetic analysis will be conducted to precisely assess the SARS-CoV-2 spillover date along with the dates of the main events that marked the evolution of its most recent ancestors and descendants.Whole genome, gene S, and RBD phylogenies will be inferred and studies in detail.Our findings will be compared to those found in the literature, providing insights into the timeline and dynamics of betacoronavirus evolution before and after the emergence of SARS-CoV-2.

Results
Similar to most viruses, the evolutionary history of SARS-CoV-2 is closely related to its affinity to bind to the host cells and to replicate inside them.To explore adequately the evolution of SARS-CoV-2, the sequences corresponding to the spike (S) gene and its RBD have been extracted and used to generate a total of 6 datasets (i.e. 6 multiple sequence alignments).
Considering these datasets, we performed root-to-tip regressions using the TempEst software [18].For each dataset, we found the corresponding R 2 values, which ranged between 0.16 and 0.40 (see Fig 1).The regression results presented in this figure account for the degree of clocklike behavior.The slope of the regression, representing the number of substitutions/site/ year, varied between 2.1x10 -2 and 8.0x10 -3 for the dataset containing human variants (see Fig 1A , 1C and 1E).The datasets without human variants showed variations between 4.22x10 -2 and 4.32x10 -3 .It is worth noting that other studies have estimated the average substitution rate of the SARS-CoV-2 genome at around 8.9 × 10 −4 [27] and 6.677 × 10 −4 [28].The increased mutation rate found in our datasets can be explained by the inclusion of coronavirus strains from different host species.Such a host diversity leads to large differences observed for the age of the most recent common ancestor (represented by the x-intercept).While a root-to-tip regression cannot be used as a formal determination of temporal signal, it is nonetheless a useful tool for validating the degree of temporal signal in heterochronous sequences prior to applying Bayesian clock models [18].
Following the root-to-tip regression analysis of the whole genome, gene S, and RBD data, a Bayesian evaluation of temporal signal (BETS) was carried out to estimate the variants divergence time (see Fig 2).Two molecular clock models were tested: (1) The strict clock model that assumes a constant rate of evolution over the entire tree history, and (2) The relaxed lognormal clock model that allows a different rate of evolution across branches, following a lognormal probability distribution.In both cases, a coalescent Bayesian skyline tree prior was selected to account for population fluctuations over time [29,30].
Marginal likelihood results for the three datasets (whole genome, gene S, and RBD) containing SARS-CoV-2 variants indicate a strong support in favor of the presence of a temporal structure in the data (Fig 2 ; see the results for data with SARS-CoV-2 variants).In every case, the relaxed lognormal clock presented the strongest support relative to the strict clock, although all models tested using the dataset containing the SARS-CoV-2 variants showed evidence of temporal structure.
On the contrary, the datasets without SARS-CoV-2 variants (Fig 2 ; see the results for data without SARS-CoV-2 variants) provided mixed and attenuated results.We could observe negligible support in favor of temporal structure in the whole genome and RBD models with a relaxed lognormal clock, whereas the RBD model with a strict clock showed barely any support against temporal structure.Interestingly, the gene S dataset with a relaxed lognormal clock showed strong support against temporal structure, while the results of the strict clock model applied to the whole genome and the gene S were also against the presence of temporal structure in the data.
Using the TreeAnnotator software [31], the Maximum Clade Credibility (MCC) tree of the best-performing model has been generated for each dataset (see Figs 3-5).Posterior probability values for the major clades of the six phylogenies presented in Figs 3-5 have been indicated near the corresponding tree branches.These probabilities represent the statistical robustness of the corresponding internal tree branches [32].We can notice that most of the main clades of the presented betacoronavirus phylogenies are very robust as the associated posterior probabilities are equal or close to 1.The three datasets without SARS-CoV-2 variants produced informative phylogenies (see Figs 3A, 4A and 5A) that are consistent with previous studies [16,33].Furthermore, the two MCC trees obtained with the whole genomes have similar trends with previously established SARS-CoV-2 phylogenies [33]  As was expected, the phylogeny representing the evolution of RBD (see Fig 5) indicates a much faster evolutionary rate, as reflected by its compressed evolutionary timeline.While the previous tree roots for the human variant datasets date back to 1915 for the whole genome tree Using the created time-calibrated phylogenetic trees, we estimated the confidence intervals for The Most Recent Common Ancestor (TMRCA) of SARS-CoV-2.This analysis allowed us to give a more nuanced probable timeframe regarding the time of emergence of the SARS--CoV-2 spillover event.As such, the confidence intervals presented in Fig 6 show that the TMRCA time intervals obtained using the whole genome, gene S, and RBD datasets not containing the SARS-CoV-2 variant sequences are significantly wider than their counterparts obtained using the SARS-CoV-2 variants.
The three datasets containing SARS-CoV-2 variants allowed us to estimate the median divergence time (i.e. the SARS-CoV-2 spillover time) between August and early October 2019.The whole genome estimate of early October 2019 also includes the narrowest 95% confidence interval (August 2019 to November 2019), while the RBD estimate corresponds to the widest confidence interval (February 2019 to November 2019, with the median estimate of August 2019).In contrast, the results without SARS-Cov-2 variants suggest that the median divergence time from the wild strains ranges in the period between July and September 2019, with the farthest estimates going back to September 2017.

Discussion
The consideration of temporality within a heterochronous dataset is a necessary element to obtain pertinent time divergence estimations [18].Starting with a null hypothesis (H 0 ) assuming the absence of temporality in the data, we have attempted to reject it for the alternative hypothesis (H 1 ) favoring the presence of temporal signal in the genetic sequences.While the dataset containing a variety of coronavirus sequences distantly related to SARS-CoV-2 had a positive early signal of temporal structure according to the root-to-tip analysis (see Fig 1), the H 0 hypothesis could only be partially disproved by the results of the BETS analysis (see Fig 2).Furthermore, gene S sequences without human SARS-CoV-2 variants showed a strong signal against the temporality in the data, with both the strict and relaxed lognormal clock models.Gene S is a key genetic element in overcoming interspecies transmission, and as such, it faces significant evolutionary pressure [34].Thus, the lack of temporal structure found for gene S sequences indicates that further research and collection of evolutionary significant dated sequences are still required to adequately retrace the evolutionary history of SARS-CoV-2 and its close relatives.Moreover, the whole genome and the RBD datasets without SARS-CoV-2 variants both failed to demonstrate clear temporal signals through their BETS analysis using the model with a strict molecular clock, while the models with a relaxed lognormal clock showed only a slight support toward the presence of temporal structure.While this level of temporal structure support is much lower than the support obtained for the datasets containing SARS-CoV-2 variants, our results suggest that the approach of dividing the collected sequences into two distinct datasets representing both sides of the evolutionary bottleneck has the potential to improve our understanding of the evolutionary history of zoonosis events.
Datasets containing SARS-CoV-2 variants displayed a significantly higher degree of temporal structure, with the whole genome dataset presenting the strongest evidence of temporality.The main conclusion we can draw by observing our temporality results is that the emergence of SARS-CoV-2 took place between August and October 2019.Our estimation is generally in agreement with the SARS-CoV-2 emergence period assessments provided by other researchers.For instance, Roberts et al. [35] used a well-established extinction estimator approach, called Optimal Linear Estimation (OLE), to discover that the first COVID-19 case occurred between October 4 th 2019 and November 17 th 2019 [35].Furthermore, Worobey et al. conducted spatial and mobility analyses of early COVID-19 cases to estimate that the SARS-CoV-2 spillover event occurred in November 2019 [36].Previous studies using phylogenetic inference and epidemiologic simulations, focusing on the SARS-CoV-2 lineage A and B, have identified the probable spillover time being between late October 2019 and mid-December 2019 [37].Interestingly, the officially accepted SARS-CoV-2 emergence data early of December 2019 SARS-CoV-2 [38] falls just outside of our 95% highest posterior density (HPD) confidence intervals (see Fig 6).
The phylogenetic trees containing SARS-CoV-2 variants show a significant variation over root dates.The gene S and the whole genome phylogenies are rooted with nodes dating back to 1907 and to 1915, respectively, while the RBD phylogenies are rooted with nodes dating back to 2009 (for data with variants) and to 1997 (for data without variants).This difference in rooting time for the RBD phylogenies is likely due to its very rapid evolutionary rates, which are also unequal for different lineages, and multiple recombination events affecting the coronavirus RBD [17,39].
Our investigation of the evolutionary history of SARS-CoV-2 includes not only the estimation of timing of its zoonotic spillover but also covers the extended period of coronavirus circulation within non-human hosts preceding the pandemic.The inferred divergence times (see Figs 3-5) suggest that the virus had a prolonged existence in its natural reservoirs, i.e. bats and pangolins, before the emergence of SARS-CoV-2.Notably, the pangolin coronavirus sequences, including Guangxi-P5E, -P2V, -P5L, -P1E, and -P4L, demonstrate a clear separation from SARS-CoV-2 and its closely related bat counterparts due to a speciation event which occurred between the end of 2012 (see Fig 5A) and the end of 2014 (see Figs 3A and 4A).Moreover, the presence of the highest tree clade including two BANAL species (BANAL-20-236 and 20-103) and the two Guangdong pangolin coronaviruses (MP789 and Guangdong-1) in the phylogenies of gene S and RBD (see Figs 4A and 5A, respectively) suggests that gene S of the Guangdong pangolin coronaviruses was probably affected by a horizontal gene transfer (stemming from the above-mentioned BANAL species) and recombination event that took place around the middle of 2018.This event accounts for genetic resemblance between the RBD of the SARS-CoV-2 and Guangdong pangolin coronaviruses, which had been explained by gene transfer from Guangdong pangolin coronaviruses to the ancestor of SARS-Cov-2 in some earlier works in the field [4,14,17,40], before the BANAL coronaviruses were discovered.It is worth noting that the phylogenies and the timings inferred (see Figs 3-5) are in agreement with some recent studies discussing the role of Rhinolophus bats (BANAL-20-52,-103, and 256) in the evolution of SARS-CoV-2 [16,41].
Our study has a few limitations.First, the phylogenies inferred using the Bayesian approach cannot be used to represent horizontal gene transfer and recombination events which have occurred during the evolutionary timeline under study.Such recombination events have been shown to affect the tree topologies, and thus could influence the TMRCA estimates as well [9,10].Their inclusion in a future study could offer a more nuanced and complex explanation of the evolution of SARS-CoV-2 and the related betacoronaviruses.However, conducting such an analysis could be a very challenging task.In order to take into account horizontal gene transfer and recombination events, one should first detect all mosaic regions of a given multiple sequence alignment, then remove these regions from the sequences and realign them.Another possible option for taking into account recombination among coronaviruses consists in the adaptation of the BETS analysis [20] to phylogenetic networks taking into account horizontal gene transfer and both intra-and inter-genic recombination events [42,43].
Obviously, the mosaic nature of the SARS-Cov-2 genome needs to be investigated in more detail.For example, Ul-Rahman et al. [6] conducted a phylogenetic analysis of various betacoronavirus strains of human and non-human mammalian hosts (e.g.pangolins, bat, dog, tiger, mink and mouse) and identified a close relationship between coronavirus sequences, suggesting a likely evolution from a common ancestor and thus a non-mosaic nature of the SARS--CoV-2 genome.Ul-Rahman et al. did not conduct the temporality analysis of these coronavirus strains.However, the presence of mosaic genes in the SARS-CoV-2 genome was suggested in some later studies in the field [9,17].
Furthermore, different coronaviruses are known to evolve at different evolutionary rates [3].This phenomenon has been taken into account using a relaxed clock model while conducting the Bayesian phylogenetic analysis.This model allowed us to consider a variation in the rate of evolution across branches.Such an approach works well for sequences with high similarity but may fail to adequately show large changes in substitution rates which could occur in the wild.For instance, it has been observed that the between-lineage rate of the SARS-CoV-2 phylogenies is much higher than that of the within-lineage rate [44,45].These limitations will be addressed in our future studies.

Conclusion
We established that the SARS-CoV-2 spillover event most likely occurred between August 2019 and October 2019.Our results are generally consistent across all models generated using BEAST2 and supported by the literature [35].Moreover, we found that the presence of a statistically robust clade in the phylogenies of gene S and RBD, including two BANAL and two Guangdong pangolin coronaviruses and closely located to SARS-CoV-2, is most probably due to the horizontal gene transfer of gene S from BANALs to Guangdong pangolin coronaviruses that occurred in the middle of 2018.The presented methodology can be applied to determine the timing of other possible spillover events, such as plant viruses infecting new species, which are likely to occur as a consequence of climate change and simplification of the ecosystems [46,47].This could ultimately help in mitigating their prevalence by identifying the underlying factors leading to the spillover events.

Genetic data and multiple sequence alignments
Following a comprehensive review of the most frequent betacoronavirus organisms found in bats and pangolins, the genome sequences used in our study (see S1 and S2 Tables) were downloaded from the GISAID and Genbank databases, including those cited in previous works in the field [16,17].The human variants were selected to represent the lineages of interest according to the World Health Organization (WHO) and the Centers for Disease Control (CDC).
The first dataset (with SARS-CoV-2 variants) contained triplicated genomic sequences corresponding to 16 different SARS-CoV-2 variants, as well as the SARS-CoV-2 reference genome (Wuhan-1) and 7 sequences showing close genetic similarity with the human strains (5 from bats and 2 from pangolins).The second dataset contained 22 sequences of different bat and pangolin coronaviruses as well as the SARS-CoV-2 reference genome.
For each dataset, the sequences were aligned using the MUSCLE v5.1 algorithm [48] from the MEGA-X program [49] with the default parameters.The gene S and the RBD sequences were extracted from the whole genome alignments using the SARS-CoV-2 reference genome annotations as reference, and then realigned separately.Large gaps in all alignments were removed using the Gblocks tool (version 0.91b) from the phylogeny.frweb server [50].

Bayesian phylogenetic analysis
The dates of the main speciation events in the phylogenies shown in Figs 3-5 were calculated using the BEAST v2.7.5 software [51].For each model, we ran three sets of computations, each consisting of 2 x 10 7 steps.The three sets of the results obtained were then combined using the LogCombiner v2.6.7 program [31].This was necessary for ensuring the convergence of the independent Markov chain Monte Carlo (MCMC) model and providing more robust parameter estimates.For each combined set of results, we verified that the effective sampling size of key parameters was over 200, as recommended by [52].For each model, its most important parameters, including the clock model, the site model, and the tree priors are reported in S3 and S4 Tables.
For each pair of models, containing or not the temporal data, a BETS analysis was conducted to evaluate the strength of the temporal signal.The marginal likelihood of each model was obtained using generalized stepping-stone sampling [53], and subsequently used to compute the corresponding Bayes factors.The qualitative interpretation of the obtained Bayes factors, used to support or to refute a hypothesis, was done according to the Kass-Raftery scale [54].Both the marginal likelihood estimate and the Bayes factors are reported in S2 Table.
The tree topologies issued from the Bayesian analysis have been summarized using the Maximum Clade Credibility (MCC) method available in the TreeAnnotator v2.6.4 program [31].Appropriate scaling factors for phylogenetic trees have been selected using the three datasets without SARS-CoV-2 variants.Since the branch lengths of a given tree represent the mean number of substitutions per site that have occurred along them [55], we used scaling factors for assessing the mean substitution rates with the 95% HPD confidence intervals.The scaling factors used for the whole genome, gene S and RBD phylogenetic trees without SARS-CoV-2 variants, were, respectively, 1 x 10 −3 , 8 x 10 −4 and 1 x 10 −1 .The TMRCA times and their 95% HPD confidence intervals have been computed using Figtree v1.4.4 [56].

Root-to-tip regressions
Maximum-likelihood phylogenetic trees have been inferred using the program IQ-TREE v2.2 [57] with an optimal substitution model chosen by the software (see S5 Table ).These phylogenetic trees have been used as input [58], along with the sampling dates of all genome or gene sequences, of the Tempest v1.5.3 program in which the best-fitting root parameters were used [18].

Fig 1 .
Fig 1. Root-to-tip regression analysis of the six selected datasets.The ordinate represents the number of substitutions per site each year.Each plot shows the regression of the genomic distances of the sequences against their sampling times.Each plot contains the slope of the regression, the intercept with the x-axis (x-intercept), and the R 2 value associated with the regression.The three plots on the left (A, C, E) correspond to the datasets comprising the SARS-CoV-2 human variants, whereas the three plots on the right (B, D, F) correspond to the datasets without human variants.https://doi.org/10.1371/journal.pone.0301195.g001

Fig 2 .
Fig 2. Bayesian Evaluation of Temporal Signal (BETS) analysis results for the six selected datasets.For each dataset, the Bayes factor corresponding to the UCLN (relaxed lognormal) and the strict clock models are shown.The Bayes factor represents the difference in fit between a model containing temporal data and a model that does not contain it.A positive value supports the hypothesis of the presence of temporal structure in a dataset, while a negative value does not support this hypothesis.A Bayes factor value between 0 and 3 indicates that barely any support has been found, a value between 3 and 20 indicates a positive support, and a value over 20 indicates a strong support [54].https://doi.org/10.1371/journal.pone.0301195.g002

Fig 3 .
Fig 3. Maximum clade credibility (MCC) trees of the whole-genome datasets.Divergence times (decimal years) for each event of interest are indicated on the tree nodes and posterior probability values are shown for the main clades.A) The MCC tree for the datasets without SARS-CoV-2 variants, and B) The MCC tree for the dataset with SARS-CoV-2 variants.https://doi.org/10.1371/journal.pone.0301195.g003

Fig 4 .Fig 5 .
Fig 4. Maximum clade credibility (MCC) trees of the gene S datasets.Divergence times (decimal years) for each event of interest are indicated on the tree nodes and posterior probability values are shown for the main clades.A) The MCC tree for the datasets without SARS-CoV-2 variants, and B) The MCC tree for the dataset with SARS-CoV-2 variants.https://doi.org/10.1371/journal.pone.0301195.g004